Overview

Dataset Statistics

Number of Variables 12
Number of Rows 96095
Missing Cells 1275
Missing Cells (%) 0.1%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 38.8 MB
Average Row Size in Memory 423.3 B
Variable Types
  • Categorical: 6
  • Numerical: 6
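
The missing-cell figures above can be cross-checked with a little arithmetic. The sketch below uses only counts reported in this profile (the per-column missing counts come from the Variables section); the variable names are illustrative and not part of the report.

```python
# Cross-check the overview's missing-cell figures from the reported counts.
n_rows = 96095
n_vars = 12
missing_cells = 1275

# Per-column missing counts as listed under Variables; all other columns report 0.
missing_by_column = {"geolocation_lat": 269, "geolocation_lng": 269, "avg_score": 737}

total_cells = n_rows * n_vars
missing_pct = 100 * missing_cells / total_cells

# The per-column counts should add up to the dataset-level total.
assert sum(missing_by_column.values()) == missing_cells
print(f"{total_cells} cells, {missing_pct:.2f}% missing")
```

The result rounds to the 0.1% shown in the overview.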

Dataset Insights

total_payment_sum is skewed
geolocation_lat is skewed
geolocation_lng is skewed
avg_score is skewed
frequency is skewed
customer_unique_id has a high cardinality: 96095 distinct values
most_recent_order has a high cardinality: 95833 distinct values
first_order has a high cardinality: 95836 distinct values
customer_city has a high cardinality: 4119 distinct values
customer_unique_id has constant length 32
most_recent_order has constant length 19
first_order has constant length 19
customer_state has constant length 2
customer_unique_id has all distinct values
geolocation_lat has 95718 (99.61%) negatives
geolocation_lng has 95826 (99.72%) negatives
frequency has 93355 (97.15%) zeros

Variables


customer_unique_id

categorical

Approximate Distinct Count 96095
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 9321215

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row 0000366f3b9a7992bf...
2nd row 0000b849f77a49e4a4...
3rd row 0000f46a3911fa3c08...
4th row 0000f6ccb0745a6a4b...
5th row 0004aac84e0df4da2b...

Letter

Count 1153001
Lowercase Letter 1153001
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1922039
  • customer_unique_id contains many words: 96095 words
  • customer_unique_id has words of constant length

total_payment_sum

numerical

Approximate Distinct Count 28458
Approximate Unique (%) 29.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1537520
Mean 166.5942
Minimum 0
Maximum 13664.08
Zeros 2
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • total_payment_sum is skewed right (γ1 = 9.467)

Quantile Statistics

Minimum 0
5-th Percentile 32.69
Q1 63.12
Median 108
Q3 183.53
95-th Percentile 476.152
Maximum 13664.08
Range 13664.08
IQR 120.41

Descriptive Statistics

Mean 166.5942
Standard Deviation 231.4289
Variance 53559.3413
Sum 1.6009e+07
Skewness 9.467
Kurtosis 243.3038
Coefficient of Variation 1.3892
  • total_payment_sum is not normally distributed (p-value 7.460836565388573e-25)
  • total_payment_sum has 7656 outliers
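
The report does not state which criterion it uses to count outliers. As a minimal sketch, assuming the common Tukey rule (values more than 1.5 × IQR beyond the quartiles), the fences for total_payment_sum can be derived from the reported Q1 and Q3. The function name `tukey_fences` and the multiplier `k` are illustrative choices, not taken from the report.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for total_payment_sum (Q1 = 63.12, Q3 = 183.53).
lower, upper = tukey_fences(63.12, 183.53)
print(f"lower={lower:.3f}, upper={upper:.3f}")
```

Under this assumption the upper fence sits at about 364.15, and the lower fence is negative, so with a minimum of 0 only large payments could be flagged.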

most_recent_order

categorical

Approximate Distinct Count 95833
Approximate Unique (%) 99.7%
Missing 0
Missing (%) 0.0%
Memory Size 8071980

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row 2018-05-10 10:56:2...
2nd row 2018-05-07 11:11:2...
3rd row 2017-03-10 21:05:0...
4th row 2017-10-12 20:29:4...
5th row 2017-11-14 19:45:4...

Letter

Count 0
Lowercase Letter 0
Space Separator 96095
Uppercase Letter 0
Dash Punctuation 192190
Decimal Number 1345330
  • most_recent_order contains many words: 50710 words
  • The most frequent word (20171124) occurs over 2.36 times as often as the second most frequent word (20171125)
  • most_recent_order has words of constant length

first_order

categorical

Approximate Distinct Count 95836
Approximate Unique (%) 99.7%
Missing 0
Missing (%) 0.0%
Memory Size 8071980

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row 2018-05-10 10:56:2...
2nd row 2018-05-07 11:11:2...
3rd row 2017-03-10 21:05:0...
4th row 2017-10-12 20:29:4...
5th row 2017-11-14 19:45:4...

Letter

Count 0
Lowercase Letter 0
Space Separator 96095
Uppercase Letter 0
Dash Punctuation 192190
Decimal Number 1345330
  • first_order contains many words: 50706 words
  • The most frequent word (20171124) occurs over 2.39 times as often as the second most frequent word (20171125)
  • first_order has words of constant length
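
Both most_recent_order and first_order are profiled as categorical strings of constant length 19, which is consistent with a `YYYY-MM-DD HH:MM:SS` timestamp. The sketch below, assuming that format, parses such strings so they can be compared as dates. The two timestamps are hypothetical (the report's own samples are truncated), and `parse_order_timestamp` is an illustrative helper.

```python
from datetime import datetime

# Assumed layout: 4+2+2 date digits, a space, 2+2+2 time digits (19 characters).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_order_timestamp(text: str) -> datetime:
    """Parse one order timestamp string into a datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


# Hypothetical values for illustration only.
first_order = parse_order_timestamp("2017-11-24 09:30:00")
most_recent_order = parse_order_timestamp("2017-12-01 18:45:10")

# Whole days elapsed between the first and the most recent order.
span = most_recent_order - first_order
print(span.days)
```

Once parsed, the columns would be profiled as datetimes, and the high-cardinality and word-count insights above would no longer apply.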

geolocation_lat

numerical

Approximate Distinct Count 14825
Approximate Unique (%) 15.5%
Missing 269
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 1533216
Mean -21.1811
Minimum -36.6054
Maximum 42.184
Zeros 0
Zeros (%) 0.0%
Negatives 95718
Negatives (%) 99.6%
  • geolocation_lat is skewed right (γ1 = 1.6695)

Quantile Statistics

Minimum -36.6054
5-th Percentile -28.5265
Q1 -23.5882
Median -22.9259
Q3 -20.1251
95-th Percentile -7.7798
Maximum 42.184
Range 78.7894
IQR 3.4631

Descriptive Statistics

Mean -21.1811
Standard Deviation 5.6339
Variance 31.741
Sum -2.0297e+06
Skewness 1.6695
Kurtosis 3.6147
Coefficient of Variation -0.266
  • geolocation_lat is not normally distributed (p-value 4.1474278892858213e-23)
  • geolocation_lat has 15686 outliers
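
The maximum latitude of 42.184 stands out against 99.6% negative values. The state codes in this dataset (SP, RJ, SC, PA) suggest Brazilian customers, though the report does not say so. As a sketch under that assumption, a bounding-box check can flag coordinates that fall outside the country. The box limits below are approximate, assumed values, and `in_bounding_box` is an illustrative helper.

```python
# Approximate bounding box for Brazil in decimal degrees (assumed values).
LAT_MIN, LAT_MAX = -33.75, 5.27
LNG_MIN, LNG_MAX = -73.99, -34.79


def in_bounding_box(lat: float, lng: float) -> bool:
    """Return True if the coordinate lies inside the assumed box."""
    return LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX


# Reported medians of geolocation_lat and geolocation_lng.
print(in_bounding_box(-22.9259, -46.6311))
# Reported maximum latitude, paired with the median longitude for illustration.
print(in_bounding_box(42.184, -46.6311))
```

A box is a coarse filter: it accepts some points in neighbouring countries and over the ocean, so flagged rows are candidates for review rather than confirmed errors.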

geolocation_lng

numerical

Approximate Distinct Count 14826
Approximate Unique (%) 15.5%
Missing 269
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 1533216
Mean -46.1733
Minimum -72.6667
Maximum -8.5779
Zeros 0
Zeros (%) 0.0%
Negatives 95826
Negatives (%) 99.7%
  • geolocation_lng is skewed right (γ1 = 0.0464)

Quantile Statistics

Minimum -72.6667
5-th Percentile -52.3674
Q1 -48.1104
Median -46.6311
Q3 -43.5993
95-th Percentile -38.4988
Maximum -8.5779
Range 64.0889
IQR 4.5111

Descriptive Statistics

Mean -46.1733
Standard Deviation 4.0692
Variance 16.5587
Sum -4.4246e+06
Skewness 0.04641
Kurtosis 2.3632
Coefficient of Variation -0.08813
  • geolocation_lng is not normally distributed (p-value 3.355054786794859e-17)
  • geolocation_lng has 4015 outliers
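
The negative coefficients of variation reported for geolocation_lat and geolocation_lng follow directly from the definition, standard deviation divided by mean, applied to a negative mean. The sketch below reproduces both values from the reported statistics; the function name is illustrative.

```python
def coefficient_of_variation(std: float, mean: float) -> float:
    """Standard deviation divided by the mean (undefined for mean == 0)."""
    return std / mean


# Reported standard deviation and mean for each coordinate column.
lat_cv = coefficient_of_variation(5.6339, -21.1811)
lng_cv = coefficient_of_variation(4.0692, -46.1733)
print(f"lat CV = {lat_cv:.4f}, lng CV = {lng_cv:.5f}")
```

For coordinates the sign and magnitude say little, because the mean depends on where the origin of the coordinate system lies; the standard deviation alone is usually the more useful spread measure here.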

avg_score

numerical

Approximate Distinct Count 9
Approximate Unique (%) 0.0%
Missing 737
Missing (%) 0.8%
Infinite 0
Infinite (%) 0.0%
Memory Size 1525728
Mean 4.0848
Minimum 1
Maximum 5
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • avg_score is skewed left (γ1 = -1.3621)

Quantile Statistics

Minimum 1
5-th Percentile 1
Q1 4
Median 5
Q3 5
95-th Percentile 5
Maximum 5
Range 4
IQR 1

Descriptive Statistics

Mean 4.0848
Standard Deviation 1.3472
Variance 1.815
Sum 389514
Skewness -1.3621
Kurtosis 0.5035
Coefficient of Variation 0.3298
  • avg_score is not normally distributed (p-value 7.264850719882165e-22)
  • avg_score has 14005 outliers

customer_state

categorical

Approximate Distinct Count 27
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6438365
  • The most frequent value (SP) occurs over 3.25 times as often as the second most frequent value (RJ)

Length

Mean 2
Standard Deviation 0
Median 2
Minimum 2
Maximum 2

Sample

1st row SP
2nd row SP
3rd row SC
4th row PA
5th row SP

Letter

Count 192190
Lowercase Letter 0
Space Separator 0
Uppercase Letter 192190
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (SP, RJ) take over 50.0%
  • customer_state has words of constant length

customer_city

categorical

Approximate Distinct Count 4119
Approximate Unique (%) 4.3%
Missing 0
Missing (%) 0.0%
Memory Size 7240068
  • The most frequent value (sao paulo) occurs over 2.26 times as often as the second most frequent value (rio de janeiro)

Length

Mean 10.3428
Standard Deviation 4.0003
Median 9
Minimum 3
Maximum 32

Sample

1st row cajamar
2nd row osasco
3rd row sao jose
4th row belem
5th row sorocaba

Letter

Count 921191
Lowercase Letter 921191
Space Separator 72256
Uppercase Letter 0
Dash Punctuation 222
Decimal Number 2
  • customer_city contains many words: 3286 words

customer_zip_code_prefix

numerical

Approximate Distinct Count 14986
Approximate Unique (%) 15.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1537520
Mean 35186.2848
Minimum 1003
Maximum 99990
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • customer_zip_code_prefix is skewed right (γ1 = 0.776)

Quantile Statistics

Minimum 1003
5-th Percentile 3318.7
Q1 11390
Median 24440
Q3 59033.5
95-th Percentile 90540
Maximum 99990
Range 98987
IQR 47643.5

Descriptive Statistics

Mean 35186.2848
Standard Deviation 29800.5042
Variance 8.8807e+08
Sum 3.3812e+09
Skewness 0.776
Kurtosis -0.7931
Coefficient of Variation 0.8469
  • customer_zip_code_prefix is not normally distributed (p-value 6.926910929659394e-07)
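
customer_zip_code_prefix is profiled as numerical, but a postal prefix is an identifier, so its mean, skewness, and normality test carry little meaning. The minimum of 1003 also hints that leading zeros may have been lost when the column was read as a number. As a sketch, assuming the prefixes are meant to be five-digit codes (an assumption, not stated in the report), they can be restored as zero-padded strings. `format_zip_prefix` and its `width` parameter are illustrative.

```python
def format_zip_prefix(prefix: int, width: int = 5) -> str:
    """Render a numeric prefix as a zero-padded string of the given width."""
    return str(prefix).zfill(width)


# Reported minimum and maximum of customer_zip_code_prefix.
print(format_zip_prefix(1003))   # 01003
print(format_zip_prefix(99990))  # 99990
```

Stored this way, the column would be treated as categorical in a profile like this one.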

count

categorical

Approximate Distinct Count 9
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6342271
  • The most frequent value (1) occurs over 33.92 times as often as the second most frequent value (2)

Length

Mean 1
Standard Deviation 0.003226
Median 1
Minimum 1
Maximum 2

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 96096
  • The top 2 categories (1, 2) take over 50.0%

frequency

numerical

Approximate Distinct Count 2313
Approximate Unique (%) 2.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1537520
Mean 1.2815
Minimum 0
Maximum 304.4895
Zeros 93355
Zeros (%) 97.2%
Negatives 0
Negatives (%) 0.0%
  • frequency is skewed right (γ1 = 12.3511)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 0
Median 0
Q3 0
95-th Percentile 0
Maximum 304.4895
Range 304.4895
IQR 0

Descriptive Statistics

Mean 1.2815
Standard Deviation 12.0118
Variance 144.2836
Sum 123142.8314
Skewness 12.3511
Kurtosis 177.673
Coefficient of Variation 9.3735
  • frequency is not normally distributed (p-value 4.22930354328315e-25)
  • frequency has 2740 outliers
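
The outlier count for frequency can be explained from the other reported figures. Every quantile up to the 95th percentile is 0, so the IQR is 0. Under a 1.5 × IQR rule (assumed here; the report does not name its criterion) both fences collapse to 0 and every non-zero value is flagged. The sketch below checks that the number of non-zero rows matches the reported 2740 outliers.

```python
n_rows = 96095
zeros = 93355

# Rows with a non-zero frequency, and the share of zeros.
nonzero = n_rows - zeros
zero_pct = 100 * zeros / n_rows

# Reported quartiles for frequency are both 0, so the IQR is 0.
q1, q3 = 0.0, 0.0
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

print(f"non-zero rows: {nonzero}, zeros: {zero_pct:.2f}%, fences: [{lower}, {upper}]")
```

In other words, the "outliers" here are simply the customers with any non-zero frequency, so the count describes the zero inflation of the column rather than extreme values.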

Interactions

Correlations

Missing Values